Increasingly keen interest has been manifested in comparatively
recent years in statistical interpretations of numerical data. This
interest has been responsible for the development of numerous analytical
methods and for their application to the study of variability in data
pertaining to many fields.

Present-day statisticians, as well as practical and theoretical
economists, have borrowed extensively from contributions made by
biologists and sociologists, and, among other uses, they have
appropriately adapted certain probability and error theories in their
measurements of relationships and in their analysis of variation. These
theories have been applied to some extent also in the appraisal of the
magnitude of variances, and they have been found useful in deciding on
the probabilities of whether or not different series of observations and
their means represent unstratified populations.

Economists, agronomists, and other workers have found it
impossible to solve many of their problems to their own satisfaction
by the mere application of correlation methods and by the interpretation
of variation from results obtained through application of the more
commonly used variability and dispersion measures. Need is sometimes
felt for measures that indicate the relative magnitudes of contributions
made from different detected sources to total variability. This
is particularly true in those analyses of differences in paired and
replicate series of observations which necessitate, in addition to a
determination of the magnitudes of different parts of variability
contributing to the total, an appraisal of differences both between
and within the series. It is for this reason, at least in part, that
present-day statistical analysts have found it convenient to employ
more comprehensive techniques in the interpretation of variation in their
sample data and in the evaluation of inherent, or characteristic,
differences in paired and replicate series of observations.

It seems logical in analyzing the total variability in paired
and replicate series of data to assume that the different sources of
variability must first be detected before any reliable estimate can be
made as to the relative importance or significance of the contributing
parts. Unless this is done, it is difficult, indeed, if not impossible,
to logically appraise the relative importance of the different parts of
variability. Fortunately, the present-day analyst is now able to proceed
in this way and thus to more comprehensively interpret his data. He can
do so either (1) by separating into its component parts the summation
of squares of the differences between individual observations and their
common mean and then proceeding to calculate the different parts of
average squared variability, or (2) by calculating the squares of the
standard deviations by one of the ordinary methods and then isolating
the individual parts of variability. Choice of method depends somewhat
upon the characteristics of the data themselves, the sources
from which variability is contributed, and the use that is to be made
of the derived results.

It has been quite appropriately suggested by Dr. R. A. Fisher,
eminent English statistician, working with agronomic data at the
Rothamsted Experimental Station, Harpenden, England, that the term
"variance" 1/ be applied to the standard deviation squared, and this
usage of the term has been quite generally adopted. Investigators
who have applied the analysis of variance method to the interpretation
of variability have been especially impressed with the possibility of
its more extensive adaptation and usefulness and the convenience with
which it may be used in many instances to analyze squared variability
in measuring differences between averages and within series of paired
and replicate observations.

In this report an analysis is made of differences in the
classification of press-box and cut samples of cotton taken from the
same bales. 2/ The basic data represent actual staple-length
designations by government classers and are a part of a larger mass of
data compiled and analyzed, all of which is available for inspection.
The two sets of samples

1/ The credit for introduction of the statistical technique known as
"Analysis of Variance" belongs to Dr. R. A. Fisher and his associates.
It was published in 1923 (Fisher, R. A., and Mackenzie, W. A., Jour.
Agri. Sci., Vol. 13, Part 3, July, 1923, pp. 311-320), and since that
time, having been used in statistical studies of numerous descriptions,
has been found to be the most appropriate method yet suggested for
many mathematical inquiries into the probability of significance of
differences.

2/ Press-box samples are those taken from the gin press-box while the
bales were in process of being ginned, whereas cut samples are those
taken from the same bales after they had been pressed and tied.
were classed by the same classers and under as nearly uniform conditions
as practicable to maintain at the time. 3/


The purpose of the analysis herein described is not primarily to direct
attention to the basic data, but rather to illustrate the application of a
method of statistical procedure that has been successfully used in the
interpretation of differences in classification of press-box and cut samples
of cotton, by which other series of observations falling in the same general
category may be analyzed and the differences between them and within them
appropriately evaluated. An additional statistical procedure is described in
this paper by which relative magnitudes of the different parts of squared
variability are logically appraised. 3/ If it were not for the desirability
of separately evaluating the different component parts of squared
variability, the derivation of the t value would serve the same purpose
in the interpretation of differences

2/ (cont'd.)

During the 1930-31 season, the Grade and Staple Statistics Section of
the Division of Cotton Marketing, U. S. Department of Agriculture,
conducted a study for the purpose of obtaining information on the
possible effects of gin compression on staple-length designation of
cotton samples. A similar study was conducted on a smaller scale during
the 1929-30 season. Several office reports have been prepared in which are
presented in detail different phases of the results of these studies.
The classification data used in this report are a part of those procured
in connection with the study made in 1930-31.

The analysis of variance method and the procedure which permits
variability to be separated into component parts free from estimates of
error were introduced into the Division of Cotton Marketing in October,
1930. Since that time they have been found increasingly useful in the
interpretation of cotton classing and other variability.

3/ Kemp, W. B., Jour. Amer. Stat. Assn., Vol. XXIX, No. 186, June, 1934,
p. 147.


between means as is served by the z value in the analysis of variance.

It is to be realized at the outset that lack of agreement in
magnitude of paired observations on staple length of cotton samples
may result in "bias" as well as the inevitable "spread" in the
distributions attributable to the lack of agreement. For convenience,
therefore, the former term is used to indicate that part of the total
variability contributed by the net difference, or difference on the
whole, between different series of observations. It is obvious, then,
that the bias is necessarily represented in the "spread" and is a part
of it. Usage of the term "bias" is not to be understood, however, as
implying that observations in one series differ consistently from those
in another series.

The bias may quite generally account for only a part of the
variability caused by lack of agreement in magnitude of paired
observations, so that one of the principal problems in analyzing
differences in classing is to make an appraisal of the bias in order that
there may also be available a measure of the variability attributable to
"compensating tendencies." To this latter measure there is herein
applied the term "error," which, in some respects a residual, is
indicative of the variability within series of paired observations that
is contributed, in addition to bias, by the failure of these
observations to agree in magnitude.

The necessity for separately evaluating bias and error as
interpreted in this discussion is fully realized when an attempt is
made to compare the measures of variability contributed from the two
sources. It would not be logical, of course, to express the bias
and error in terms different from that representing "spread" and then
attempt comparisons. Furthermore, the "spread" includes both bias
and error, and the two must be treated as distinct and separate
measures in order to show the magnitude of each and to indicate their
relative importance. The analysis of variance method is used in
determining probability of significance of differences between
measures of average squared variability, and the method suggested by
Dr. Kemp is used in showing the relative magnitudes of the component
parts of total squared variability. Technique involved in the
application of these methods is illustrated by the analysis herein made.

The following equations, 4/ representing technique that has
been applied in studying the differences in classifications of
press-box and cut samples of cotton taken from the same bales, will
indicate the nature of the calculations and the fundamental mathematical
principles underlying them. 5/

1. Correction factor (c) = $\frac{[\sum (x + y)]^2}{2n}$, or
$\frac{(\sum x + \sum y)^2}{2n}$, in which "x" and "y" represent the
individual observations in the two series, and "n" the number of
observations in each of the series, which is the same, of course, as the
number of paired observations when each is present in duplicate.

2. Total summation of squares, or total squared variability,
= $\sum x^2 + \sum y^2 - c$

4/ The analytical technique represented by these equations has been
applied also to studies of differences in classifications of
identical samples by two or more classers and to studies of
differences in classifications of different samples from the same
bales. The method is applicable in some instances to the
interpretation of variation in prices and other economic data.

5/ Another method (unpublished) of analysis, furnishing comprehensive
interpretations of variation and mean differences, has been suggested
by O. T. Weaver and successfully used by him.
3. Bias, or squared variability attributable to the difference
between the summations and, consequently, the means of x and y,
= $\frac{(\sum x)^2}{n} + \frac{(\sum y)^2}{n} - c$

4. "Sample" variability, or the squared variability attributable to
differences in magnitudes of successive x + y summations, or to
accepted differences within the series upon which the
observations are made, which would necessarily result in
differences in magnitudes of successive x + y summations,
= $\frac{\sum (x + y)^2}{2} - c$

5. Error = total squared variability (equation 2) - sample variability
(equation 4) - bias (equation 3)

The data in table 1, in which there is an average difference of
0.7 of one-sixteenth of an inch between the two series of paired
observations, are compared and the differences between and within the
series interpreted. Application of the equations is made in the
interpretations, and the calculations suggested by them are illustrated.

Table 1.- Distribution of staple-length designations of
10 paired press-box and cut samples of cotton 1/

 Sample  :      Staple-length designation       :
 number  :       (sixteenths of an inch)        :  x + y
         :         x          :        y        :
         : (press-box sample) :  (cut sample)   :
---------:--------------------:-----------------:----------
    1    :         13         :       12        :    25
    2    :         13         :       15        :    28
    3    :         14         :       15        :    29
    4    :         14         :       14        :    28
    5    :         15         :       17        :    32
    6    :         16         :       17        :    33
    7    :         16         :       16        :    32
    8    :         17         :       16        :    33
    9    :         18         :       20        :    38
   10    :         19         :       20        :    39
---------:--------------------:-----------------:----------
 Total   :        155         :      162        :   317
 Mean 2/ :         15.5       :       16.2      :    15.85 3/

1/ Press-box samples are those taken from the gin press-box while the
bales were in process of being ginned, whereas cut samples are those
taken from the same bales after they had been pressed and tied.
2/ Calculated for comparative purposes on the lower limits of the staple-
length groups. Means based on midpoints may be obtained by adding

3/ Representing the common mean of the two series of observations.


1. Correction factor = 317 x 15.85 = 5024.45

2. Total squared variability = 2441 + 2680 - 5024.45 = 96.55

3. Bias, or x - y squared variability, = $\frac{(155)^2}{10} + \frac{(162)^2}{10}$ - 5024.45 = 2.45

4. "Sample" variability = $\frac{10225}{2}$ - 5024.45 = 88.05

5. Error = 96.55 - 88.05 - 2.45 = 6.05
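The five calculations above can be retraced with a short script. What follows is only a minimal sketch in Python: the function name `partition_squared_variability` and the dictionary keys are illustrative choices and do not appear in the report, and the data are the ten pairs of table 1.

```python
# Paired staple-length designations from table 1 (sixteenths of an inch).
x = [13, 13, 14, 14, 15, 16, 16, 17, 18, 19]  # press-box samples
y = [12, 15, 15, 14, 17, 17, 16, 16, 20, 20]  # cut samples


def partition_squared_variability(x, y):
    """Split the total sum of squares of paired data into bias, sample, and error.

    Hypothetical helper following equations 1 to 5 of the text.
    """
    n = len(x)  # number of paired observations
    # Equation 1: square of the grand total divided by the 2n observations.
    correction = (sum(x) + sum(y)) ** 2 / (2 * n)
    # Equation 2: sum of all squared observations less the correction factor.
    total = sum(v * v for v in x) + sum(v * v for v in y) - correction
    # Equation 3: squared series totals, each divided by n, less the correction.
    bias = sum(x) ** 2 / n + sum(y) ** 2 / n - correction
    # Equation 4: squared pair totals divided by 2, less the correction.
    sample = sum((a + b) ** 2 for a, b in zip(x, y)) / 2 - correction
    # Equation 5: what remains of the total after sample and bias are removed.
    error = total - sample - bias
    return {
        "correction": correction,
        "total": total,
        "bias": bias,
        "sample": sample,
        "error": error,
    }


parts = partition_squared_variability(x, y)
for name, value in parts.items():
    print(f"{name:>10}: {value:.2f}")
```

Rounded to two decimals, the printed figures should agree with calculations 1 to 5 (5024.45, 96.55, 2.45, 88.05, and 6.05).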

Correction factor. - This value is determined, as the equation
indicates, by obtaining the product of the summation of observations
and their mean, which is the equivalent of the quotient resulting from
dividing the square of the summation of observations (square of 317)
by the number of observations (20). It represents the difference
between the summation of the squared observations and the summation of
the squares of deviations from the common mean, 15.85, and it may be
expressed as follows:

Correction factor = $\sum x^2 + \sum y^2 - \left[\sum (x - \overline{x+y})^2 + \sum (y - \overline{x+y})^2\right]$,
in which $\overline{x+y}$ represents the common mean of the x and y observations.

Total squared variability. - This summation of squares, as
calculated, is the difference between the summation of the squares of all
observations and the correction factor, the correction factor being
subtracted because it constitutes the difference between the summation
of squares of individual observations and the summation of squares of
deviations from the common mean. It is the equivalent of the total of
the squares of deviations of all observations in the two series from
the common mean, as the following equation indicates: Total squared
variability = $\sum (x - \overline{x+y})^2 + \sum (y - \overline{x+y})^2$,
in which x and y are individual observations and $\overline{x+y}$ the
common mean. The measure of total squared variability may be obtained by
squaring the deviations of individual observations in table 1 from their
common mean, 15.85, and then summating the squares.

When paired observations are of the same magnitude, the total of
their squares is one-half as great as the square of their summation and
equal to the product of their mean and summation. When paired
observations are not of the same magnitude, the total of their squares
exceeds one-half the square of their summation by an amount equal to
one-half the square of the difference between the two observations;
and it exceeds the product of their mean and summation, also, to
the extent of one-half the square of this difference.
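The relationship just stated for a single pair is easy to check numerically. The snippet below is only an illustrative sketch in Python; the function name `excess_over_half_square_of_sum` is a made-up label, and the pairs tried are taken from table 1.

```python
def excess_over_half_square_of_sum(a, b):
    """Amount by which a^2 + b^2 exceeds one-half the square of (a + b)."""
    return a * a + b * b - (a + b) ** 2 / 2


def half_square_of_difference(a, b):
    """One-half the square of the difference between the two observations."""
    return (a - b) ** 2 / 2


# A pair in agreement (sample 4) and two pairs not in agreement (samples 1 and 5).
for a, b in [(14, 14), (13, 12), (15, 17)]:
    print(a, b, excess_over_half_square_of_sum(a, b), half_square_of_difference(a, b))
```

For every pair the two quantities printed side by side should coincide, and they are zero only when the paired observations agree.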

Bias. - As will be observed by the equation already presented,
and by the calculations following table 1, this measure of variability,
containing, as derived, one part of error (0.6722, column 4 of table 2),
since there is one degree of freedom (column 2 of table 2), is
contributed by differences, on the whole, between the series. In the
calculations presented, it is the quantity, plus indicated error, that
remains after the correction factor has been subtracted from the
summation of the quotients obtained by dividing the squares of the
summations of both the x and y series by the number of observations
in each series.

When calculated by the method described, it is the equivalent
of the product of the total number of observations and the square of
one-half the difference between the mean of the x series of observations
and the mean of the y series, one-half of the difference between
these two means being the same as the total difference between the
common mean and the mean of each series. This may be expressed by
the following equation:

Bias = $2n\left(\frac{\bar{x} - \bar{y}}{2}\right)^2$, in which "n", as
already stated, represents the number of paired observations.

The calculated means of the two series of observations shown
in table 1 are 15.5 and 16.2, respectively, the difference between
them being 0.7, the equivalent of the arithmetic mean of the algebraic
summation of the differences between the paired observations. It is
obvious that the algebraic summation of differences between individual
observations in both series and their common mean, 15.85, is necessarily
0.


Since there is an average difference of 0.7 between the two series
of paired observations, the quantity 2.45 can be obtained also by
squaring this average difference, multiplying by the number of paired
observations, and then dividing by 2. It will be realized, of course,
that the product of the square of this average difference and the
number of paired observations is twice as great as the product of the
square of one-half the average difference and the total number of x and
y observations, and that it is four times as great as the product of the
square of one-half the average difference and the number of paired
observations. Comprehension of these relationships between magnitudes
of products makes it possible to comprehend more readily the measure
of bias.
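The several routes to the bias described in this section can be placed side by side. The lines below are a minimal Python sketch using the table 1 data; the variable names are illustrative only.

```python
# Table 1 data (sixteenths of an inch).
x = [13, 13, 14, 14, 15, 16, 16, 17, 18, 19]  # press-box samples
y = [12, 15, 15, 14, 17, 17, 16, 16, 20, 20]  # cut samples

n = len(x)                                  # number of paired observations
mean_difference = sum(y) / n - sum(x) / n   # 16.2 - 15.5

# Route 1: squared series totals over n, less the correction factor.
correction = (sum(x) + sum(y)) ** 2 / (2 * n)
bias_from_totals = sum(x) ** 2 / n + sum(y) ** 2 / n - correction

# Route 2: total number of observations times the square of half the difference.
bias_from_half_difference = 2 * n * (mean_difference / 2) ** 2

# Route 3: square of the average difference, times n, divided by 2.
bias_from_difference = n * mean_difference ** 2 / 2

print(round(bias_from_totals, 4))
print(round(bias_from_half_difference, 4))
print(round(bias_from_difference, 4))
```

All three routes should return the 2.45 shown in calculation 3.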

Sample variability. - This measure, containing parts of error
(0.6722) as indicated by the degrees of freedom in table 2, is
calculated in the analysis of data in table 1 by squaring the summations
of pairs of observations, summating the squares, dividing by 2, and
then subtracting the calculated correction factor. Sample variability
as herein calculated is contributed by differences in the magnitudes
of successive x + y summations (see table 1). These differences in
x + y summations are present when successive observations in the
individual series are of unlike magnitude and paired observations
are in perfect agreement, and they may be present either when
successive observations are of unlike magnitude and paired observations
are not in perfect agreement, or when successive observations in one
series are of like magnitude and corresponding observations in the paired
series are not in agreement.

The measure may be calculated also by squaring the deviations
of observations in each individual series from their mean, summating
the squares, and subtracting the error, which, as will be shown by the
following explanation, may be calculated independent of the correction
factor, total squared variability, and "sample" variability.

Error. - For purposes of the analysis of variance, a measure
of error may be obtained by subtraction, it being the quantity
remaining, as indicated by the equations and by the calculations
following table 1, after measures of variability calculated for bias and
"sample" have been subtracted from total squared variability. Error is
contributed by lack of agreement in the magnitude of observations, and
it is in addition to bias, or net difference, which is also contributed
by lack of agreement. 6/ It will be understood, of course, that there is
no bias unless the differences in series of observations are
non-compensating, so that the summation and average of one series are
larger or smaller than the summation and average of another.

Problems often confronting the analyst when observations are not
in agreement in respect to magnitude and when averages of series differ
are, as already indicated, the determination of the probability of
significance of differences between variances, and the separation of
total squared variability into its component parts. These are essential
if differences are to be properly evaluated and if the most desirable
comparisons are to be made between the different parts of squared
variability.

In applying the analysis of variance method, the desired measure
for error may be determined independent of the correction factor, total
squared variability, and "sample" variability. (A convenient advantage
is thus afforded in checking computations.) This calculation is made by
determining the difference between magnitudes of each pair of matched
observations, squaring these differences, dividing by 2, summating, and
then subtracting the bias. Instead of dividing the square of each difference
by 2 and then summating, the squares may be summated and the total divided
by 2.
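The checking route described above, together with the alternative calculation of "sample" variability from deviations within each series, can be sketched as follows. This is an illustrative Python sketch on the table 1 data, and every variable name in it is an assumption made for the example.

```python
# Table 1 data (sixteenths of an inch).
x = [13, 13, 14, 14, 15, 16, 16, 17, 18, 19]  # press-box samples
y = [12, 15, 15, 14, 17, 17, 16, 16, 20, 20]  # cut samples
n = len(x)

# Error obtained directly from the paired differences.
squared_differences = [(a - b) ** 2 for a, b in zip(x, y)]
half_sum_of_squared_differences = sum(squared_differences) / 2
bias = n * ((sum(y) - sum(x)) / n) ** 2 / 2
error_direct = half_sum_of_squared_differences - bias

# Error obtained by subtraction, for comparison.
correction = (sum(x) + sum(y)) ** 2 / (2 * n)
total = sum(v * v for v in x + y) - correction
sample = sum((a + b) ** 2 for a, b in zip(x, y)) / 2 - correction
error_by_subtraction = total - sample - bias

# "Sample" variability from deviations within each series, less the error.
mean_x, mean_y = sum(x) / n, sum(y) / n
within_x = sum((a - mean_x) ** 2 for a in x)
within_y = sum((b - mean_y) ** 2 for b in y)
sample_from_deviations = within_x + within_y - error_direct

print(round(error_direct, 4), round(error_by_subtraction, 4))
print(round(sample, 4), round(sample_from_deviations, 4))
```

Agreement between the two figures on each printed line is the computational check the text refers to.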

With  the  total  squared  variability  separated  into  designated 
component  parts,  each  containing  as  many  parts  of  average  squared 

6/ It has been observed in the classification of press-box and cut samples
representing the same bales, and also in the classification of
identical samples by the same and different classers, that there is
frequently inconsistent variation in the distribution of staple-length
observations. Tolerance in classing may possibly account for an
appreciable part of this inconsistency, and the differences themselves
between magnitudes of paired staple-length observations may be due in
part to actual differences in the cotton upon which observations are made.
variability for error as are indicated by the corresponding degrees of
freedom, a further calculation consists of determining whether one
estimate of squared variability obtained from n1 degrees of freedom 7/
differs significantly from another estimate of squared variability
obtained from n2 degrees of freedom. Fortunately, the problem is
simple, it being only necessary to calculate the z value 8/ equal to
half the difference between the natural logarithms of the two derived
measures of average squared variability, or to the difference between
the natural logarithms of the corresponding standard deviations (i.e.,
the difference between the natural logarithms of the square roots of the
measures of average squared variability).

The values in column 4 of table 2 represent the calculated averages
of squared variability, obtained by dividing the squared measures in
column 3 by the degrees of freedom in column 2. Then, as Fisher has
explained, if P represents the probability of exceeding the calculated
z value by mere chance, it becomes possible to obtain the value of z
corresponding to different values of P, n1, and n2. 9/

7/ The term "degrees of freedom" is used in stating the number of series of
observations or the number of observations within series that may be
free to vary from any single series or observation. In Fisher's table
of 5 percent points, n1 corresponds to the larger variance as calculated
in tables 2 and 4.

8/ The distribution of this z value is closely related in principle to the
distribution of z values worked out by "Student" and Pearson. Fisher's
z value is equal to one-half of the natural logarithm of the quotient
obtained by dividing one average squared variability, such as presented
in column 4 of table 2, by another. It is calculated in this report by
determining the difference between one-half the natural logarithms of
two derived estimates of variance, which is the equivalent of one-half
the difference between two such logarithms. A comprehensive explanation
of analytical procedure is presented by C. H. Goulden in a paper entitled
"Application of the Variance Analysis to Experiments in Cereal Chemistry"
(Cereal Chemistry, Vol. IX, No. 3, May, 1932, pp. 239-260).

9/ Readers may be interested in an article by T. Eden, Tea Research Institute
of Ceylon, and F. Yates, Rothamsted Experimental Station, entitled "On
the Validity of Fisher's Z Test When Applied to an Actual Example of
Non-normal Data" (Jour. Agr. Sci., Vol. 23, Part 1, January, 1933, pp. 6-17).
The following table, based on table 1 and the results of the
analysis of data contained therein, illustrates the procedure by which
it may be determined whether or not the measure of average squared
variability obtained from n1 degrees of freedom is significantly greater
than that obtained from n2 degrees of freedom.


Table 2. - Sources of variability, degrees of
freedom, and measures of variability
contributed from specified detected
sources

       1       :    2    :      3       :      4      :     5
---------------:---------:--------------:-------------:----------
   Source of   : Degrees :   Squared    :   Average   :
    squared    :   of    : variability  :   squared   : ½ log_e
  variability  : freedom : (summation   : variability :    2/
               :         : of squares)  :     1/      :
---------------:---------:--------------:-------------:----------
 Bias          :    1    :     2.45     :   2.4500    :  0.4481
 Sample        :    9    :    88.05     :   9.7833    :  1.1403
 Error         :    9    :     6.05     :    .6722    :  -.1986
---------------:---------:--------------:-------------:----------
 Total         :   19    :    96.55     :             :

1/ Squared variability divided by degrees of freedom.

2/ ½ log_e equals ½ log_10 times 2.3026, or log_10 times 1.1513. These
values were calculated by obtaining the products of 2.3026 and one-
half the common logarithms of the numbers in column 4. See footnote
2, table 4.
Column 5 of the table shows the one-half natural logarithms of
the averages in column 4. For the bias measure of 2.45, containing
1 part of error (0.6722), the ½ log_e value is 0.4481, and for error
alone the ½ log_e value is -0.1986. The difference between 0.4481
and -0.1986 is 0.6467, which is the z value corresponding thereto, or
to bias free from the 1 part of error. To avoid negative logarithms,
this difference can be calculated in this instance by moving the
decimals in column 4 one place to the right.
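Columns 4 and 5 of table 2 and the z value for bias can be recomputed from columns 2 and 3. The following is a minimal Python sketch; the dictionary names are illustrative, and natural logarithms are taken directly rather than through common logarithms as in the table footnote, so the last decimal place may differ slightly from the printed table.

```python
import math

# Columns 2 and 3 of table 2.
degrees_of_freedom = {"bias": 1, "sample": 9, "error": 9}
squared_variability = {"bias": 2.45, "sample": 88.05, "error": 6.05}

# Column 4: squared variability divided by degrees of freedom.
average_squared = {
    source: squared_variability[source] / degrees_of_freedom[source]
    for source in squared_variability
}

# Column 5: one-half the natural logarithm of column 4.
half_log = {source: 0.5 * math.log(value) for source, value in average_squared.items()}

# z for bias against error: difference between the two half logarithms.
z_bias = half_log["bias"] - half_log["error"]

# Shifting both averages one decimal place to the right leaves z unchanged.
z_bias_shifted = 0.5 * math.log(10 * average_squared["bias"]) - 0.5 * math.log(
    10 * average_squared["error"]
)

for source in average_squared:
    print(f"{source:>6}: {average_squared[source]:.4f}  {half_log[source]:.4f}")
print(f"z (bias vs. error): {z_bias:.4f}")
```

The shifted calculation illustrates why moving the decimals avoids negative logarithms without altering the difference: the added constant, one-half the logarithm of 10, cancels.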

In the table 10/ showing 5 percent points of the distribution
of z, with n1 equaling 1 and n2 equaling 9, the z value is 0.8163,
indicating that a value of z as great as or greater than 0.8163 would
be expected to be obtained by chance alone in not more than 5 percent
of the number of cases. It is apparent, then, that the z value
coinciding with the 5 percent point in the distribution of z values
is the equivalent of the calculated odds of 19 to 1.

Whenever a derived z value is smaller than that occurring at
the 5 percent point, it would seem logical to conclude that the odds

10/ Fisher, R. A., Statistical Methods for Research Workers, fourth
edition, 1932, table VI, pages 224 and 225. See pages 226 and 227
for 1 percent points of the distribution of z. Readers who wish
to estimate the reliability of very small samples may find it
convenient to refer to a table prepared by "Student" (Metron V.,
No. 3, 1926, pp. 105-112).

The necessity for removing effects of correlation in paired data
is emphasized by Dr. W. B. Kemp in an enlightening paper entitled
"The Reliability of a Difference Between Two Averages" (Jour. Amer.
Soc. of Agronomy, Vol. 16, No. 6, June, 1924, pp. 359-362).
Correlation relative to formulas for error is discussed by "Student"
in a paper entitled "On Testing Varieties of Cereals" (Biometrika,
Vol. 15, parts 3 and 4, December, 1923, pp. 271-293), and F. D.
Richey, in a paper on "Adjusting Yields to Their Regression on a
Moving Average as a Means of Correcting for Soil Heterogeneity"
(Jour. Agri. Research, Vol. 27, No. 2, January 12, 1924, pp. 79-90),
discusses correlation in paired data.


are less than 19 to 1 that the difference being considered is significant,
or less than 19 to 1 against the difference being due to chance alone.
Since our calculated z value is only 0.6467, which is appreciably less
than the 5 percent value, the magnitude of the difference considered is
not regarded as being significant from this standpoint. As will be
indicated later, however, considerable importance might be attached to
the fact that the differences occur, and the existence of bias, or net
difference, might be of special interest in a further study of the
possible causal relationship between gin compression and staple-length
designation of cotton samples.
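
The comparison just described may be retraced with a brief sketch in
Python. It is offered for illustration only: the function name
fisher_z and the variable names are hypothetical, the two variances are
those quoted in the text for bias and error in table 2, and the 5
percent point is the tabular value quoted above. Computed directly, the
result agrees with the 0.6467 of the text to within rounding in the
fourth decimal.

```python
import math


def fisher_z(larger_variance, smaller_variance):
    """Half the difference between the natural logarithms of two variances."""
    return 0.5 * (math.log(larger_variance) - math.log(smaller_variance))


# Values quoted in the text (table 2, column 4).
bias_variance = 2.4500
error_variance = 0.6722

# 5 percent point of z for n1 = 1 and n2 = 9, as quoted from Fisher's table VI.
z_five_percent = 0.8163

z = fisher_z(bias_variance, error_variance)
is_significant = z >= z_five_percent

print(f"z = {z:.4f}, 5 percent point = {z_five_percent}")
print("significant at the 5 percent point:", is_significant)
```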


Calculating the t value by the formula for paired observations, in
which the means of the x and y series in table 1 are used and in which
correction is made for degrees of freedom in deriving the standard
deviation squared of the differences between paired observations, 1.9094
is obtained, which is less than the value indicating significance in
Fisher's table of t. 11/

If the z value occurred far beyond the 5 percent point in the
table of z value distributions, the logical conclusion would be that
the difference under consideration is markedly significant. According
to the distribution of z values, it would be expected that the z values
in 95 percent of the cases would be less than 0.8163 when n1 equals 1
and n2 equals 9. It is seen, therefore, that the calculated z value
of 0.6467 would be expected to occur in this 95 percent group.

11/ Fisher, R. A., Statistical Methods for Research Workers, fourth
edition, 1932, table IV.

Another table, which has already been referred to in footnote 10,
shows 1 percent points of the distribution of z. With n1 equaling 1
and n2 equaling 9, the 1 percent point is found to be 1.1786, indicating
that a z value as great as or greater than 1.1786 would be expected to
be obtained by chance alone in not more than 1 percent of the cases,
and that 99 percent of the z values would be expected to be smaller
than the value of 1.1786 shown in the table. If a derived z value
corresponds to that shown at the 1 percent point, the odds are 99 to 1
that the difference being considered is significant. If the derived
z value is greater than that occurring at the 1 percent point, the odds
are more than 99 to 1 that the difference is significant. The lack of
significance of such differences does not necessarily mean, of course,
that no importance is to be attached to differences in contributions
made to total squared variability from the different sources detected.
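
The correspondence between a percentage point and the odds quoted in
the text (19 to 1 at the 5 percent point, 99 to 1 at the 1 percent
point) is simple arithmetic, sketched below in Python. The function
name odds_against_chance is hypothetical and is introduced here only
for illustration.

```python
from fractions import Fraction


def odds_against_chance(tail_percent):
    """Odds against a value this large arising by chance alone.

    tail_percent is the percentage point (for example 5 or 1).
    The result is returned as an exact fraction, read as "result to 1".
    """
    p = Fraction(tail_percent, 100)
    return (1 - p) / p


for percent in (5, 1):
    print(f"{percent} percent point: odds of {odds_against_chance(percent)} to 1")
```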

The use of the tables of z distributions must proceed in this
analysis with a complete understanding that n1 has reference to the
larger measure of average squared variability, or variance, and that
n2 refers to the smaller measure. It must be realized also that when
the number of observations is small, the difference between calculated
measures of average squared variability may be quite large before they
become statistically significant, and that as the number of degrees of
freedom increases smaller differences may be expected to indicate
significance. This applies also, of course, to certain common probability
measures.

It will be observed by reference to column 4 of table 2 that the
measure for bias is 2.4500, and that for error it is 0.6722. Column 2
shows the corresponding number of degrees of freedom. The measure for
"sample" is 9.7833, with 9 degrees of freedom, but the problem of
determining the probability of significance of differences has been
concerned only with the measures for bias and error. If the total
squared variability were separated into only two component parts, we
would have 1 degree of freedom for the difference between series and
18 for differences within series.

As already indicated, 1 part of error, 0.6722, may be understood as
being contained in the measure 2.45 in column 3 of table 2, and 9 parts
in the measure 88.05. To obtain percentages representing estimates of
the proportionate contributions made from the different detected
sources of variability it is first necessary to free these measures
from error. The total, 96.55, is then divided into the recalculated
parts of variability. Thus we have 96.55 divided into 1.78 [or 2.45
minus (1 times 0.6722)], 82.00 [or 88.05 minus (9 times 0.6722)], and
12.77 [or 6.05 plus (9 times 0.6722) plus (1 times 0.6722)], to
obtain the proportionate contributions made to total squared variability
by bias, sample, and error. The percentages obtained by this procedure
are 1.84 for bias, 84.93 for sample, and 13.23 for error. In respect
to magnitude of contributions made to total squared variability, error
is much more important than bias, indicating that the inconsistent
"spread" in the classification of samples represented in table 1
contributed more to total squared variability than did the net difference
between the two series of staple-length designations.
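
The arithmetic of freeing the parts from error and converting them to
percentages may be sketched as follows in Python. The sketch is
illustrative only; the dictionary layout and variable names are
hypothetical, and the figures are those quoted in the text for table 2.

```python
# Average squared variability for error (table 2, column 4).
error_mean_square = 0.6722

# source -> (degrees of freedom, squared variability from column 3)
parts = {
    "bias": (1, 2.45),
    "sample": (9, 88.05),
    "error": (9, 6.05),
}

total = sum(squares for _, squares in parts.values())

# Remove from bias and sample the error they are understood to contain,
# and add what was removed to the error part.
freed = {}
removed = 0.0
for source, (df, squares) in parts.items():
    if source != "error":
        contained_error = df * error_mean_square
        freed[source] = squares - contained_error
        removed += contained_error
freed["error"] = parts["error"][1] + removed

percent = {source: round(100 * value / total, 2) for source, value in freed.items()}

for source, value in freed.items():
    print(f"{source:6s} {value:8.2f} {percent[source]:6.2f} percent")
```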

It has already been observed that about 95 percent of the z
values pertaining to such populations or universes for which n1 is 1
and n2 is 9 may be expected to be smaller than the value of 0.8163 shown
in the z table referred to. Therefore, the difference between the
variances of 2.4500 and 0.6722 designated as bias and error, respectively,
in table 2 is not considered significant when interpreted from the table
inasmuch as the calculated z value of 0.6467 is less than the z value
coinciding with the 5 percent point. 12/


12/ The analysis of variance method is admirably adaptable to the study
of bias and error in paired and replicate classifications of identical
samples by any number of classers, and it has been found useful
for this purpose. It is the only method yet suggested that provides
for the determination of significance of differences between variances
attributable to bias and error, or to any other two sources,
and at the same time makes it possible by a modification of the
analytical approach to provide a measure of total squared variability
and measures of component parts, free from estimated error, that
may for some purposes be logically compared with one another.
A method suggested by O. T. Weaver provides, however, for the
elimination of the effects of modal tendencies of errors of
observation. (See footnote 5.)



The difference, as already observed, between the means of the
x and y series of observations in table 1 is 0.7, representing the
extent in sixteenths of an inch to which the y series is greater, on
the whole, than the x series. This difference, as indicated, is the
same as the average of the algebraic summation of deviations of y
observations from the x observations. A principal advantage in
separating into its component parts the total summation of squared
deviations from the common mean, which has already been illustrated,
is that the derived measures readily lend themselves to comparison
and interpretation, which may not be possible if the calculated measures
are in unlike terms.

The difference of 0.7 between the two means is obviously not
directly comparable with the average of squared differences between
paired observations, nor is the average difference between the two
means comparable with the average of squared deviations from the common
mean of the two series. This is attributable not only to the fact that
this difference between the means of the two series is in linear terms
and cannot be directly compared, therefore, with squared measures, but
also, and perhaps chiefly, to the fact that the differences between
paired observations contribute the entire "spread" or dispersion, which
represents both the bias and the error, the latter having reference to
that part of the "spread" or dispersion which is in addition to the bias
or net difference between the series.


In order to show results of the analysis of variance when
applied to a larger group of data than that presented in table 1,
and in order to further illustrate the practical application of the
analytical technique, two additional series containing much
greater numbers of observations are presented in table 3. The basic
data contained in the first two columns of this table represent
the actual classifications of press-box and cut samples of cotton
taken from the same bales and classed by the same classers and under
as nearly uniform conditions as practicable to maintain at the time.
Table 3. - Distribution of staple-length designations of 16,977
           paired press-box and cut samples of cotton

 Staple-length designation 1/:       :           :         :         :
   (sixteenths of an inch)   :       : Frequency :         :         :
     x 2/     :     y 3/     : x + y :  (n) 4/   :   xn    :   yn    : (x + y)n
-------------------------------------------------------------------------------
      13      :      13      :  26   :      499  :   6,487 :   6,487 :   12,974
      13      :      14      :  27   :      742  :   9,646 :  10,388 :   20,034
      13      :      15      :  28   :       75  :     975 :   1,125 :    2,100
      13      :      16      :  29   :       12  :     156 :     192 :      348
      14      :      13      :  27   :      303  :   4,242 :   3,939 :    8,181
      14      :      14      :  28   :    5,959  :  83,426 :  83,426 :  166,852
      14      :      15      :  29   :    1,744  :  24,416 :  26,160 :   50,576
      14      :      16      :  30   :      179  :   2,506 :   2,864 :    5,370
      14      :      17      :  31   :       18  :     252 :     306 :      558
      14      :      18      :  32   :        2  :      28 :      36 :       64
      15      :      13      :  28   :       14  :     210 :     182 :      392
      15      :      14      :  29   :    1,177  :  17,655 :  16,478 :   34,133
      15      :      15      :  30   :    2,612  :  39,180 :  39,180 :   78,360
      15      :      16      :  31   :      636  :   9,540 :  10,176 :   19,716
      15      :      17      :  32   :       25  :     375 :     425 :      800
      15      :      18      :  33   :        2  :      30 :      36 :       66
      16      :      13      :  29   :        4  :      64 :      52 :      116
      16      :      14      :  30   :       80  :   1,280 :   1,120 :    2,400
      16      :      15      :  31   :      420  :   6,720 :   6,300 :   13,020
      16      :      16      :  32   :    1,814  :  29,024 :  29,024 :   58,048
      16      :      17      :  33   :      649  :  10,384 :  11,033 :   21,417
      16      :      18      :  34   :       11  :     176 :     198 :      374
-------------------------------------------------------------------------------
 Total                       :       :   16,977  : 246,772 : 249,127 :  495,899
 Mean 5/                     :       :           :  14.536 :  14.674 : 14.605 6/

1/ Selected from unpublished data. Press-box samples are those taken from
the gin press-box while the bales were in process of being ginned, whereas
cut samples are those taken from the same bales after they had been
pressed and tied.

2/ Representing staple-length designations of the press-box samples. (See
footnote 1.)

3/ Representing staple-length designations of the cut samples. (See
footnote 1.)

4/ The figures in this column represent the number of times each pair of
observations occurs. For example, the paired observations of 13 in the
x column and 13 in the y column occur 499 times. The total of 1,328
(sum of 499, 742, 75, and 12) designated as 13/16-inch
(1st and 4th columns) according to classification of the press-box
samples was distributed among 4 staple lengths (2nd and 4th columns)
according to classification of the cut samples. Distributions of
other lengths are shown by the arrangement of paired classifications.

5/ Calculated for comparative purposes on the lower limits of the staple-
length groups. Means based on midpoints may be obtained by adding 0.500.

6/ Representing the common mean of the two series of observations.
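
The totals and means of table 3, and the sums of squares used in the
calculations that follow, may be checked from the paired frequencies.
The sketch below, in Python, is offered for illustration only: the
variable names are hypothetical, and the frequencies are entered as
read from table 3.

```python
# (x, y, frequency) as read from table 3; x is the press-box designation
# and y the cut-sample designation, both in sixteenths of an inch.
pairs = [
    (13, 13, 499), (13, 14, 742), (13, 15, 75), (13, 16, 12),
    (14, 13, 303), (14, 14, 5959), (14, 15, 1744), (14, 16, 179),
    (14, 17, 18), (14, 18, 2),
    (15, 13, 14), (15, 14, 1177), (15, 15, 2612), (15, 16, 636),
    (15, 17, 25), (15, 18, 2),
    (16, 13, 4), (16, 14, 80), (16, 15, 420), (16, 16, 1814),
    (16, 17, 649), (16, 18, 11),
]

n_pairs = sum(n for _, _, n in pairs)
sum_x = sum(x * n for x, _, n in pairs)
sum_y = sum(y * n for _, y, n in pairs)

# Sums of squares weighted by frequency.
sum_x2 = sum(x * x * n for x, _, n in pairs)
sum_y2 = sum(y * y * n for _, y, n in pairs)
sum_pair_totals_sq = sum((x + y) ** 2 * n for x, y, n in pairs)

mean_x = sum_x / n_pairs
mean_y = sum_y / n_pairs
common_mean = (sum_x + sum_y) / (2 * n_pairs)

print(n_pairs, sum_x, sum_y, sum_x + sum_y)
print(round(mean_x, 3), round(mean_y, 3), round(common_mean, 3))
print(sum_x2, sum_y2, sum_pair_totals_sq)
```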
1. Correction factor = 495,899 x 14.605 = 7,242,605

2. Total squared variability = 3,599,830 + 3,670,767 - 7,242,605 = 27,992

3. Bias, or x - y squared variability, =
      (246,772)^2/16,977 + (249,127)^2/16,977 - 7,242,605 = 175

4. "Sample" variability = 14,533,631/2 - 7,242,605 = 24,211

5. Error = 27,992 - 24,211 - 175 = 3,606
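
The five steps above can be retraced with the following sketch in
Python, given for illustration only. The variable names and the helper
round_half_up are hypothetical. The sums are those appearing in the
calculations; the figure 14,533,631 for the summation of (x + y)
squared is an assumed reading of the numerator in step 4. The
correction factor is formed, as in the text, from the common mean
rounded to 14.605; carrying more decimals in the mean would change the
correction factor, and with it the bias and error figures, slightly.

```python
import math


def round_half_up(value):
    """Round to the nearest whole number, halves upward (hypothetical helper)."""
    return int(math.floor(value + 0.5))


n_pairs = 16_977
sum_x, sum_y = 246_772, 249_127            # totals of the x and y series
sum_x2, sum_y2 = 3_599_830, 3_670_767      # summations of x squared and y squared
sum_pair_totals_sq = 14_533_631            # summation of (x + y) squared (assumed reading)
common_mean = 14.605                       # rounded as in the text

# 1. Correction factor
correction = round_half_up((sum_x + sum_y) * common_mean)

# 2. Total squared variability
total = sum_x2 + sum_y2 - correction

# 3. Bias, or x - y squared variability
bias = round_half_up(sum_x**2 / n_pairs + sum_y**2 / n_pairs - correction)

# 4. "Sample" variability
sample = round_half_up(sum_pair_totals_sq / 2 - correction)

# 5. Error, obtained by subtraction
error = total - sample - bias

print(correction, total, bias, sample, error)
```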

Having the total squared variability separated into designated
component parts, each containing as many parts of average squared
variability for error as are indicated by the corresponding degrees of
freedom (column 2, table 4), it is now possible to proceed with the
determination of whether or not one estimate of squared variability
obtained from n1 degrees of freedom differs significantly from another
estimate of squared variability obtained from n2 degrees of freedom.
This is done, just as in the case of calculations represented in
table 2, by deriving the z value equal to half the difference between
the natural logarithms of the two measures of average squared variability,
or to the difference between the natural logarithms of the corresponding
standard deviations (i.e., the difference between the natural logarithms
of the square roots of the measures of average squared variability). The
values in column 4 of table 4 are measures of average squared variability
and they may be referred to conveniently as variances. Values in
column 5 are the products obtained by multiplying the values in
column 4 by 10, which is the equivalent of moving the decimal points
one place to the right.
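
The two ways of deriving the z value described above are algebraically
the same. In the notation below, which is introduced here for
illustration and does not appear in the original, s_1^2 and s_2^2 stand
for the larger and smaller measures of average squared variability.
Multiplying both variances by 10 adds the same amount to each
logarithm and therefore leaves z unchanged.

```latex
z = \tfrac{1}{2}\left(\ln s_1^{2} - \ln s_2^{2}\right)
  = \ln s_1 - \ln s_2
  = \tfrac{1}{2}\ln\frac{s_1^{2}}{s_2^{2}}
  = \tfrac{1}{2}\left(\ln 10\,s_1^{2} - \ln 10\,s_2^{2}\right)
```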
Table 4. - Squared variability contributed from specified sources

      1      :    2     :      3      :      4      :      5       :     6
------------------------------------------------------------------------------
  Source of  : Degrees  :   Squared   :   Average   :   Average    :
   squared   :    of    : variability :   squared   :   squared    : 1/2 log e
 variability : freedom  : (summation  : variability : variability  :    3/
             :          : of squares) :     1/      : times 10 2/  :
------------------------------------------------------------------------------
 Bias        :        1 :       175   :    175.00   :    1750.00   :   3.7337
 Sample      :   16,976 :    24,211   :      1.43   :      14.30   :   1.3301
 Error       :   16,976 :     3,606   :       .21   :       2.10   :    .3710
------------------------------------------------------------------------------
 Total       :   33,953 :    27,992   :      --     :      --      :     --

1/ Squared variability divided by degrees of freedom.

2/ Decimals moved one place to the right to avoid negative logarithms in the
calculation of values in column 6. Note that in table 2 the negative
logarithm of average squared variability for error was used. A paper
entitled "Negative Logarithms," and dated May 13, 1932, has been prepared
by Norma L. Goudy. This paper is available in mimeographed form.

3/ 1/2 log e equals 1/2 log 10 times 2.3026, or log 10 times 1.1513. These
values were calculated by obtaining the products of 2.3026 and one-half of
the common (five-place) logarithms of the numbers in column 5.


The z value is the difference between any two of the one-half
natural logarithms in column 6. Since the problem is concerned first
with the difference between bias and error, the desired z value is
calculated by subtracting 0.3710 from 3.7337, which leaves 3.3627. In the
table showing 5 percent points of the distribution of z, 13/ with n1
equaling 1 and n2 equaling infinity, the z value is 0.6729, indicating


13/ Fisher, R. A., Statistical Methods for Research Workers, fourth
edition, 1932, table VI, pages 224 and 225. See pages 226 and 227
for 1 percent points of the distribution of z. (Refer to
footnotes 8 and 9.)
that a value of z as great as or greater than 0.6729 would be expected
to be obtained by chance alone in not more than 5 percent of the
number of cases. With n1 equaling 1 and n2 equaling 60, the z value
occurring at the 5 percent point is 0.6933. In the table showing
1 percent points of the distribution of z, with n1 equaling 1 and n2
equaling infinity, the z value is 0.9462. It is obvious, therefore,
that the z value of 3.3627 indicates the difference between the two
measures for bias and error in table 4 to be very highly significant.
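
The derivation of the half natural logarithms and of the z value for
bias and error may be sketched as follows in Python, for illustration
only. The variable names are hypothetical; the variances are the
two-decimal measures of average squared variability given for table 4,
the factor 1.1513 is the conversion from common logarithms stated in
the footnotes to that table, and the tabular z values are those quoted
in the text.

```python
import math

# Average squared variability (variances), two decimals as in table 4.
variances = {"bias": 175.00, "sample": 1.43, "error": 0.21}

# One-half the natural logarithm equals the common logarithm times 1.1513.
HALF_LN_PER_LOG10 = 1.1513

half_ln = {}
for source, variance in variances.items():
    scaled = variance * 10  # decimal point moved one place to the right
    half_ln[source] = round(HALF_LN_PER_LOG10 * math.log10(scaled), 4)

z = round(half_ln["bias"] - half_ln["error"], 4)

# Tabular points quoted in the text for n1 = 1 and n2 = infinity.
z_five_percent = 0.6729
z_one_percent = 0.9462

print(half_ln)
print("z =", z)
print("beyond the 1 percent point:", z > z_one_percent)
```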

In order to obtain percentages representing proportionate
contributions made to total squared variability from the different
detected sources, it is first necessary to free the designated
component parts of squared variability from error. The separation of
error from the designated parts of squared variability other than
error is accomplished by obtaining the products of the average squared
variability measure for error and the degrees of freedom, and then
making the proper subtractions.

Numerical measures of average squared variability are expressed
in only two decimals in column 4 of table 4. These measures have been
recalculated and carried to four decimals. They are presented in
column 4 of the following table, the first three columns of which were
adapted from table 4. Following the table are calculations indicating
the procedure by which error is separated from the measures in
column 3.
Table 5. - Sources and measures of squared variability

       1        :      2      :        3        :      4
----------------------------------------------------------------
   Source of    :   Degrees   :     Squared     :   Average
    squared     :     of      : variability 1/  :   squared
 variability 1/ : freedom 1/  :  (summation of  : variability
                :             :    squares)     :
----------------------------------------------------------------
 Bias           :          1  :          175    :   175.0000
 Sample         :     16,976  :       24,211    :     1.4262
 Error          :     16,976  :        3,606    :      .2124
----------------------------------------------------------------
 Total          :     33,953  :       27,992    :

1/ Adapted from table 4.


Bias, free from error, = 175 minus (1 times 0.2124)        =    174.7876
Sample variability, free from error, = 24,211 minus
      (16,976 times 0.2124)                                = 20,605.2976
Error = 3,606 plus (16,976 times 0.2124) plus
      (1 times 0.2124)                                     =  7,211.9148

                                                     Total = 27,992.0000

The proportionate contributions made to the total from each
detected source are calculated by dividing 27,992 into each of the
derived measures following the table. By this procedure there is
obtained 0.62 percent for bias, 73.61 percent for sample, and 25.76
percent for error. When the variability contributed from the different
sources is evaluated in this way, the bias is relatively small in
magnitude.
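
The separation of error from the parts of squared variability, and the
conversion of the freed parts to percentages of the total, may be
sketched as follows in Python. The sketch is illustrative only; the
dictionary layout and variable names are hypothetical, and the figures
are those of table 5. Percentages computed in this way are rounded to
two decimals.

```python
# Average squared variability for error, four decimals (table 5, column 4).
error_mean_square = 0.2124

# source -> (degrees of freedom, squared variability from column 3)
parts = {
    "bias": (1, 175),
    "sample": (16_976, 24_211),
    "error": (16_976, 3_606),
}

total = sum(squares for _, squares in parts.values())

# Subtract from bias and sample the error they contain (degrees of freedom
# times the error mean square) and add the amounts removed to error.
freed = {}
removed = 0.0
for source, (df, squares) in parts.items():
    if source != "error":
        contained_error = df * error_mean_square
        freed[source] = squares - contained_error
        removed += contained_error
freed["error"] = parts["error"][1] + removed

percent = {source: round(100 * value / total, 2) for source, value in freed.items()}

for source, value in freed.items():
    print(f"{source:6s} {value:12.4f} {percent[source]:6.2f} percent")
print(f"total  {sum(freed.values()):12.4f}")
```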